Lost in Space: Geolocation in Event Data
Abstract
Extracting the “correct” location information from text data, i.e., determining the place of an event, has long been a goal of automated text processing. To approximate a human-like coding scheme, we introduce a supervised machine learning algorithm that classifies each location word as either correct or incorrect. We test the algorithm, which consists of two stages, on news articles collected from around the world (Integrated Crisis Early Warning System [ICEWS] data and Open Event Data Alliance [OEDA] data). In the feature selection stage, we extract contextual information from the texts, namely the N-gram patterns for location words, the frequency of mention, and the context of the sentences containing location words. In the classification stage, we use three classifiers to estimate the model parameters on the training set and then to predict whether a location word in the test-set news articles is the place of the event. The validation results show that our algorithm improves on the accuracy of current dictionary-based geolocation methods by as much as 25%.
Keywords: Natural Language Processing, Information Extraction, Text Analysis, Supervised Machine Learning, Geolocation, Event Data
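The two stages described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the tiny `GAZETTEER`, the three features (relative mention frequency, presence in the first sentence, and a "preposition + location" bigram), and the perceptron, which stands in for the three classifiers that the abstract does not name.

```python
import re
from collections import Counter

# Hypothetical gazetteer; a real dictionary of place names would be far larger.
GAZETTEER = {"baghdad", "washington", "paris"}
# Hypothetical feature set, loosely mirroring the abstract's three feature types.
FEATURE_NAMES = ("freq", "in_first_sentence", "after_prep")


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def extract_features(text, gazetteer=GAZETTEER):
    """Stage 1: map each location word in the text to a feature dict."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    counts = Counter(t for sent in sentences for t in sent if t in gazetteer)
    total = sum(counts.values())
    features = {}
    for idx, sent in enumerate(sentences):
        for pos, tok in enumerate(sent):
            if tok not in gazetteer:
                continue
            f = features.setdefault(
                tok,
                {"freq": counts[tok] / total,
                 "in_first_sentence": 0.0,
                 "after_prep": 0.0},
            )
            if idx == 0:
                f["in_first_sentence"] = 1.0  # sentence-context feature
            if pos > 0 and sent[pos - 1] in ("in", "at", "near"):
                f["after_prep"] = 1.0  # bigram pattern such as "in Baghdad"
    return features


def train_perceptron(rows, labels, epochs=20):
    """Stage 2 (stand-in classifier): fit weights on labelled location words."""
    weights = [0.0] * len(FEATURE_NAMES)
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            x = [row[name] for name in FEATURE_NAMES]
            score = bias + sum(w * v for w, v in zip(weights, x))
            error = label - (1 if score > 0 else 0)
            if error:
                weights = [w + error * v for w, v in zip(weights, x)]
                bias += error
    return weights, bias


def predict(model, row):
    """Return 1 if the location word is predicted to be the event place."""
    weights, bias = model
    score = bias + sum(w * row[n] for w, n in zip(weights, FEATURE_NAMES))
    return 1 if score > 0 else 0


article = ("A bomb exploded in Baghdad on Monday. "
           "Officials in Washington condemned the attack. "
           "Baghdad police said ten people were hurt.")
feats = extract_features(article)
# Hand-assigned labels for this toy article: Baghdad is the event place.
model = train_perceptron([feats["baghdad"], feats["washington"]], [1, 0])
```

In this toy article both location words follow a preposition, so the bigram feature alone cannot separate them; the frequency and first-sentence features carry the distinction. This shows why contextual features can help where a plain dictionary lookup would return every place name it finds.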
Similar resources
Matrix Factorization for Near Real-time Geolocation Prediction in Twitter Stream
The geographical location is vital to geospatial applications such as event detection, geo-aware recommendation and local search. Previous research on this topic has investigated geolocation prediction framework via conducting pre-partitioning and applying classification methods. These existing approaches target user’s geolocation all at once via concatenation of tweets. In this paper, we study...
How Accurate is IP Geolocation Based on IP Allocation Data?
The Regional Internet Registries (RIRs) publish information about all allocated IPv4 and IPv6 address blocks including country codes for each block. This information can be used for IP address geolocation. However, since large IP address blocks assigned to one country may belong to large international organisations spread over multiple countries the accuracy is questionable. With data from May ...
On the Accuracy of IP Geolocation Based on IP Allocation Data
The Regional Internet Registries (RIRs) publish information about all allocated IPv4 and IPv6 address blocks including country codes for each block. This information can be used for IP address geolocation. However, since large IP address blocks assigned to one country may belong to large international organisations spread over multiple countries the accuracy is questionable. With data from May ...
A Support Platform for Event Detection using Social Intelligence
This paper describes a system designed to support event detection over Twitter. The system operates by querying the data stream with a user-specified set of keywords, filtering out non-English messages, and probabilistically geolocating each message. The user can dynamically set a probability threshold over the geolocation predictions, and also the time interval to present data for.
Lost Space Renewal; a Reborn of an Urban Water Body
Due to the rapid growth of urbanization and economic demand, we are continuously losing our fields, our free lands, open sky, ponds, lakes; actually our breathing spaces. Sometimes, because of zoning policies or the migration or transferal of a particular business or activity, a place such as a waterfront, body of water, or military or industrial site can lose its importance, be kept vacant and become a dead place. These ar...
Journal title:
- CoRR
Volume abs/1611.04837
Publication date 2016